This page last changed on Feb 04, 2007 by dave.blasby.

Loading GNIS (Geographic Name Information System)

You need 4 datasets:

1. World GNIS

  • About 5,000,000 entries for everywhere but the USA and Antarctica
  • You want the 250mb compressed file from here

2. USA GNIS

  • About 2,000,000 entries in the USA (and protectorates)
  • By state, you need all the states file (ie. "NY_DECI.TXT")
  • Get them from here
  • Do NOT get HIST_FEATURES_DECI.TXT or POP_PLACES_DECI.TXT

3. Antarctica (Optional)

  • Get the zipfile from here

4. World Population:

  • Small dataset with Cities/Country/Region location with population
  • Available here
  • Its a smallish dataset of unknown quality
  • © by Stefan Helders www.world-gazetteer.com

Processing

Setup a Postgis database called "gnis"

1. World GNIS

i) uncompress the 250mb file (it'll be about 750mb)
ii) Use the attached perl script (parse_world.pl) to process the dataset

perl parse_world.pl geonames_dd_dms_date_20050801.txt > all_world.sql

iii) Load into your database
psql gnis -f all_world.sql > all_world.out

iv) load in the gnis_country.dump (attached to this document) file (this links the 2 letter country code to country name. This version has had 'extra' countries added to it.)
pgsql gnis -f gnis_country.dump

v) Add the feature code descriptions (attached to this document)
psql gnis -f  feature_codes.sql

vi) load the sub-national unit names (ie. states/provinces/districts):
psql gnis -f  subnational.sql

v) fluff the data up so that it is joinable with the USA data:
create index gnis_country_indx on gnis_country (country_code);
create index gnis_int_types_indx on gnis_int_types (type_code);
create index sub_nat_indx on gnis_intern_subnational (country_code,subnational_code);

create index gnis_country_indx2 on gnis_country ( bpchar(country_code) );
create index gnis_int_types_indx2 on gnis_int_types (bpchar(type_code));
create index sub_nat_indx2 on gnis_intern_subnational (bpchar(country_code),bpchar(subnational_code));

vacuum analyse gnis_country;
vacuum analyse gnis_intern_subnational;
vacuum analyse gnis_int_types;



alter table gnis_international add column country_name text;
alter table gnis_international add column sub_national text;
alter table gnis_international add column type text;
alter table gnis_international add column full_name_lc text;



update gnis_international set full_name_lc = lower(full_name_nd),
                              country_name = (select name_nd from gnis_country WHERE gnis_country.country_code =country),
                           sub_national = (select name_nd from gnis_intern_subnational WHERE country=country_code AND gnis_intern_subnational.subnational_code =gnis_international.sub_national_code ),
                           type = (select name_nd from gnis_int_types WHERE  gnis_int_types.type_code = gnis_international.feature_designition_code)
                          ;


alter table gnis_international add column est_pop int;
update gnis_international set est_pop = NULL;

2. USA

i) combine all (60) states into one big file:

cat *DECI.TXT > ../ALL_DECI.TXT

NOTE: do NOT include HIST_FEATURES_DECI.TXT or POP_PLACES_DECI.TXT in this!
ii) load into database

perl ../parse_state.pl <ALL_DECI.TXT  > all_state.sql
 psql gnis -f all_state.sql >& all_state.out

NOTE: this converts non ASCII letters (original is likely iso-8859-1 (latin-1)) to the 'best' ASCII representation.

NOTE: the perl script is here parse_state.pl

iii) adjust unique ids

update gnis_usa set uniq_featcode = uniq_featcode+ 10000000;

iv) fluff up for capatibility
alter table gnis_usa add column country text;
alter table gnis_usa add column country_name text;
alter table gnis_usa rename column  statecode to sub_national;
alter table gnis_usa rename column  feature_type to type;
alter table gnis_usa add column full_name_lc text;

update gnis_usa set country = 'US', country_name ='United States',  full_name_lc = lower(full_name_nd);


update gnis_usa set type = 'City/Town' where type ='ppl';

Combine

CREATE table gnis as
SELECT full_name_nd as full_name,
       full_name_lc,
       sub_national,
       country_name,
       country as country_code,
       est_pop,
       type,
       uniq_featcode,
       the_geom
       FROM gnis_international;

;

INSERT into gnis
SELECT  full_name_nd as full_name,
       full_name_lc,
       sub_national,
              country_name,
              country as country_code,
              est_pop,
              type,
              uniq_featcode,
              the_geom
       FROM gnis_usa
;

World Population

perl parse_pop.pl < dataen.txt > out.sql
in postgis:
   \i out.sql
   update gnis_pop set the_geom= setsrid(the_Geom,4326);
   create index gnis_pop_spatial on gnis_pop using gist (the_Geom gist_geometry_ops);
   vacuum analyse;
   insert into geometry_columns values ('','','gnis_pop','the_geom',2,4326,'POINT');

subnational.sql at: http://docs.codehaus.org/download/attachments/43149/subnational.sql
The attachments to this page are available here or hit the "Page Operations" link on the left column (near the bottom).


parse_world.pl (application/octet-stream)
parse_world.pl (application/octet-stream)
gnis_country.dump (application/octet-stream)
parse_int_types.pl (application/octet-stream)
parse_world_subnational.pl (application/octet-stream)
parse_country.pl (application/octet-stream)
feature_codes.sql (application/octet-stream)
parse_state.pl (application/octet-stream)
parse_state.pl (application/octet-stream)
parse_pop.pl (application/octet-stream)
parse_pop.pl (application/octet-stream)
parse_country_pop.pl (application/octet-stream)

When updating the gnis_usa table to determine a city/town a better query would be

update gnis_usa set type = 'City/Town' where type ='ppl' and est_pop > 0;

 because any place associated with people not in a specific category (i.e. school, park, etc.) is marked as ppl.  an example would be subdivisions which typically have their name followed by "(subdivision)" and their type is ppl.

Posted by [email protected] at Jun 06, 2006 15:40
Document generated by Confluence on Jan 16, 2008 23:28